(5) to be used as a surrogate for the original model in a
larger  scale  model  or  simulation. 

The general methodology may be termed the "state space" approach
to modeling, where the term state refers to the values of the input
variables. In the next two sections of this report, recent applications
of this approach to other problems are described. The discussion then
turns to the application of the methodology to the evaluation of the
results obtained from AMSWAG1,3, a combat simulation.

1.2 Previous Applications of State Space Approach to Modeling Simulations.

In  this  section  a  brief  review  of  the  application  of  state 
space  methodology  (SSM)  and  pattern  recognition  (PR)  to  several  DoD 
problems  is  presented.  In  the  first  application  we  describe  the  use  of 
SSM  and  PR  in  obtaining  trauma  indices  based  on  physiological,  biochemical, 
and  anatomical  measurements  taken  from  trauma  patients  at  several  medical 
centers. 


The second application consists of rationales and computer
techniques for characterizing, clustering, and screening chemical compounds
for potential biological activity and for relating structural properties
of compounds to pharmacological activities. Finally, the application of
the approach to estimating the output of a weapon system performance
model is summarized.

1.2.1  Trauma  Indices.  Considerable  effort  has  been  devoted 
to  the  development  of  a  broad  class  of  trauma  indices  covering  a  range 
of  patient  conditions.  The  original  work  was  begun  in  1973,  a  joint 
venture  among  the  Biophysics  Branch  of  the  Chemical  Systems  Laboratory, 
AMSAA,  and  the  Maryland  Institute  for  Emergency  Medical  Services  (MIEMS). 
Efforts  have  continued  in  MIEMS  and  also  with  surgeons  at  Washington 
Hospital  Center  and  Monmouth  Medical  Center. 

The indices evolved from pattern recognition analyses of over
60 physiological and biochemical variables. The individual variables,
and many combinations of variables, were evaluated using methods which
assess their capability to predict patient outcome (survival or death)
independently and correctly.

On  the  basis  of  these  computations,  advice  from  clinicians,  and 
practicality,  the  indices  were  derived.  In  each  case  the  index  --  which  is 
a  function  of  one  or  more  physiological  and  biochemical  variables  --  was 
used  to  characterize  the  "state"  of  the  patient.  For  each  of  the  indices, 
a  probability  of  mortality  curve  was  obtained  by  fitting  the  data  to  a 
logistic  model.3  Approximate  95  percent  confidence  bounds  on  the  curve 
were  computed  by  the  method  of  Kendall  and  Stuart.4 

For example, a Respiratory Index (RI) was developed as an
indicator of a trauma patient's respiratory state.4 A Renal Index (REI)
was developed as an adjunct method for evaluating renal (kidney)
failure and for indicating the need for early hemodialysis in
trauma patients. An Acute Trauma Index (ATI) and a Blunt Anatomical
Index (BAI) were developed to characterize patient status at the time of
admission to a hospital. A Triage Index has also been developed.

The Triage Index is a validated, early, rapid, noninvasive, and accurate
method for estimating injury severity, permitting appropriate matching
of trauma victims with available therapeutic resources as a means of
reducing mortality and morbidity.

Mortality correlations are available for all of the indices
individually and for certain combinations of indices, in order to accurately
characterize illness or injury affecting multiple body subsystems or to
characterize the physical and biochemical states of the patient.

All of the indices are currently being used in various
applications by researchers and clinicians at the MIEMS, Washington Hospital
Center, Monmouth Medical Center, and hospitals throughout the United States.

The applications of these indices include patient triage (which
could be most useful in military combat situations), prognosis (at the
time of hospital admission and throughout the patient stay) and tracking
of patient condition; initiation, assessment, and communication of
therapies; and general evaluation of care.9-17

1.2.2 Screening and Structure-Activity Studies of Chemical
Compounds. During the past several years, a multidisciplinary group
including biochemists, mathematicians, statisticians, and computer
scientists from several elements of the Chemical Systems Laboratory, ARRADCOM,
has been applying pattern recognition techniques to the screening of
chemical compounds and to modeling the structure-pharmacological
relationships of several classes of compounds.

The methodological developments include rationales and
computer techniques for characterizing, clustering, and screening
chemical compounds for potential biological activity, and for relating
structural properties of compounds to pharmacological activities.

In all of these applications a compound is characterized by a
property vector $X = (x_1, \ldots, x_n)$, where $x_i$ is a number which corresponds
to the value of the i-th property. A property may be any characteristic
which is believed to have some relationship to a pharmacological activity
of interest: it may be a physicochemical property such as molecular
weight or partition coefficient; a steric property such as bond radius or
an interatomic distance; or an arbitrarily chosen structural property such
as the number of oxygen atoms or the number of occurrences of a given
substructure.


In a number of applications clustering has been used as a basis
for screening a set of candidate drugs for testing. In these applications,
the multidimensional property vectors of the compounds are clustered and
several drugs are selected from each cluster for testing. The
objective is to select a small set of drugs which are "representative"
of all candidates. The methodology is based on the premise that drugs
which are close to one another in property vector space will have
similar activities. As activity data become available, the measure of
"distance between drugs" can be altered to reflect the relative "prediction
of activity" capabilities of individual properties of the property vector,
and an index to predict activity can be constructed.
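The selection step described above can be sketched in a few lines. This is a minimal illustration, not the laboratory's software: it assumes cluster labels are already available from some clustering step, picks from each cluster the compound nearest the cluster centroid as its "representative," and uses a weighted Euclidean distance whose per-property weights (the hypothetical `weights` argument) stand in for the reweighting by predictive value that the text mentions. All data values are invented.

```python
import math

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two property vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def select_representatives(vectors, labels, weights, per_cluster=1):
    """Return indices of the compounds nearest each cluster's centroid.

    vectors: property vectors X = (x_1, ..., x_n), one per compound
    labels:  cluster label of each compound (from a prior clustering step)
    weights: one non-negative weight per property; a larger weight makes
             that property count more in the distance
    """
    dim = len(vectors[0])
    chosen = []
    for cluster in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == cluster]
        centroid = [sum(vectors[i][d] for i in members) / len(members)
                    for d in range(dim)]
        # Python's sort is stable, so ties keep the original compound order.
        members.sort(key=lambda i: weighted_distance(vectors[i], centroid, weights))
        chosen.extend(members[:per_cluster])
    return chosen

# Hypothetical two-property data: two clusters of three compounds each.
compounds = [(1.0, 0.0), (1.2, 0.1), (0.9, -0.1), (5.0, 5.0), (5.0, 5.2), (8.0, 8.0)]
labels = [0, 0, 0, 1, 1, 1]
print(select_representatives(compounds, labels, weights=[1.0, 1.0]))  # -> [0, 4]
```

Setting a property's weight to zero removes it from the distance entirely, which is the simplest way to express that a property has turned out to carry no predictive value.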


1.2.3 Response Surface Methodology Application to the Laser
Designator Weapon System Simulation (LDWSS).7 LDWSS is a detailed
stochastic simulation model which was developed at MICOM in order to guide
the development and aid in the evaluation of laser-guided weapon systems
such as COPPERHEAD and HELLFIRE. Because of the relatively high cost of
exercising the LDWSS, AMSAA is conducting an investigation to generate,
through response surface or regression techniques, a mathematical model
for estimating the results of the LDWSS. The effort has produced a
low-cost, quick-response method for estimating LDWSS results which could
be included in combat simulations that consider semi-active, laser-guided
weapons such as COPPERHEAD or HELLFIRE but cannot, due to time
constraints, include LDWSS as a subroutine for making effectiveness
computations.


Data  that  were  extracted  from  336  LDWSS  runs  simulating  COPPERHEAD 
firings  have  been  subjected  to  a  cluster  analysis.  This  process  aids  in 
defining  void  regions  in  the  input  data  space  and  magnifies  the  benefits 
obtained  from  the  regression  analysis.  After  using  an  auxiliary  program 
to  sort  the  cases  according  to  cluster  membership  and  to  calculate  within- 
cluster  statistics,  regression  models  were  developed  for  each  of  the 
clusters  as  well  as  for  the  complete  set  of  336  cases. 
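The sort-by-cluster, fit-per-cluster procedure can be illustrated as follows. This is a hedged sketch under simplifying assumptions: each case is reduced to a single hypothetical input `x` and response `y` (for example, probability of hit), and an ordinary least-squares straight line is fitted within each cluster and to the pooled data. The actual LDWSS regression models used more inputs, and each cluster here needs at least two distinct `x` values.

```python
from collections import defaultdict

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fit_by_cluster(cases):
    """cases: iterable of (cluster_id, x, y).

    Returns per-cluster statistics and models, plus a pooled model.
    """
    groups = defaultdict(list)
    for cluster_id, x, y in cases:          # sort cases by cluster membership
        groups[cluster_id].append((x, y))
    per_cluster = {}
    for cluster_id, points in sorted(groups.items()):
        xs, ys = zip(*points)
        per_cluster[cluster_id] = {
            "n": len(points),
            "mean_y": sum(ys) / len(ys),    # a simple within-cluster statistic
            "model": fit_line(xs, ys),
        }
    all_x = [x for pts in groups.values() for x, _ in pts]
    all_y = [y for pts in groups.values() for _, y in pts]
    return per_cluster, fit_line(all_x, all_y)

# Hypothetical cases: (cluster, input, response).
cases = [(0, 0, 1), (0, 1, 3), (0, 2, 5), (1, 5, 5), (1, 6, 4), (1, 7, 3)]
per_cluster, pooled = fit_by_cluster(cases)
```

With these invented cases the two clusters have opposite trends that a single pooled line cannot represent, which is the motivation for fitting within clusters.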

Some  recently  generated  LDWSS  COPPERHEAD  cases  were  obtained  to 
validate  the  regression  models  developed.  The  estimates  of  probability 
of  hit  obtained  from  those  models  agreed  rather  well  with  those  generated 
by  the  LDWSS.  Current  efforts  are  focused  on  analyzing  several  hundred 
recently  generated  LDWSS  COPPERHEAD  cases  to  be  added  to  the  somewhat  small 
data  base  of  336  cases. 

2.  DISCUSSION 

The previous section briefly described a few applications of
the state space approach to the modeling of "systems": human, chemical,
and one simulation. The objective of this report, however, is to describe
the application of this methodology to a combat simulation frequently
used by AMSAA to evaluate tactics and materiel. AMSWAG is briefly
described in the following section.
2.1 Description of the AMSWAG Model.


AMSWAG is a computerized combat simulation which describes a
typical battalion-level attack/defense situation. The model is
deterministic, time-sequenced, and based on second-order differential equations.

The defending force, normally a reinforced company, is deployed
in a fixed position. The attacking force, normally a battalion, moves
along predefined paths of advance. Each defender and attacker unit
consists of a homogeneous group of weapons, such as M60A3 tanks.

AMSWAG "conducts" the battle in uniform time steps of ten
seconds each. The primary processes considered during each interval are
target acquisition, target prioritization, target allocation, fire
suppression, attacker dismount, and target attrition. Attrition is due
to direct fire, indirect fire (artillery and mortar), and minefields,
although it must be admitted that these threats are played less than
perfectly. At the end of each time step, the number of survivors in
each unit is determined by subtracting the attrition to the unit; the
ammunition levels are also depleted appropriately. AMSWAG also provides
a description of total vehicle and personnel losses on each side, vehicle
exchange and force ratios, status of surviving units, and killer-victim
scoreboards (number of kills as a function of weapon type versus weapon
type). The normal stopping rule for an AMSWAG battle is a specified
level of losses, usually 60 percent for either the attacker or defender.
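The time-stepped bookkeeping and stopping rule described above can be illustrated with a toy model. This is not AMSWAG's attrition logic: it assumes a simple discrete Lanchester square-law exchange between two homogeneous forces, with hypothetical per-shooter kill rates, a ten-second step, and a battle that ends when either side has lost a specified fraction (here 60 percent) of its initial strength.

```python
def run_battle(attackers, defenders, attacker_kill_rate, defender_kill_rate,
               dt=10.0, loss_fraction=0.6, max_steps=10_000):
    """Toy time-stepped battle; returns (outcome, elapsed seconds).

    Kill rates are hypothetical kills per surviving shooter per second.
    """
    a0, d0 = attackers, defenders
    a, d = float(attackers), float(defenders)
    for step in range(1, max_steps + 1):
        # Attrition this step is proportional to surviving enemy shooters.
        attacker_losses = defender_kill_rate * d * dt
        defender_losses = attacker_kill_rate * a * dt
        # Subtract attrition from each side at the end of the step.
        a = max(a - attacker_losses, 0.0)
        d = max(d - defender_losses, 0.0)
        # Stopping rule: a side has lost loss_fraction of its initial strength.
        # If both cross the threshold in the same step, this sketch
        # arbitrarily scores it as an attacker win.
        if d <= (1.0 - loss_fraction) * d0:
            return "attacker win", step * dt
        if a <= (1.0 - loss_fraction) * a0:
            return "attacker loss", step * dt
    return "no decision", max_steps * dt

print(run_battle(30, 10, 0.001, 0.002))
```

The elapsed time returned by such a loop plays the role of the game-length output discussed later, and the side that reaches the loss threshold first determines the coded win or loss.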

2.2 Application of State-Space Approach to Modeling AMSWAG.

We begin the application by choosing the variables which will
be used to characterize the initial "state" of the AMSWAG engagement.
This was done subjectively, based upon experience with the model.
The data which formed the basis for analysis thus consisted of one "state"
vector per game and the associated values of one or more output measures
of interest. The next step was to determine the relationship between
the state variables and the output variables.

Two  analyses  of  AMSWAG  results  were  performed.  The  first  used 
the  data  from  35  AMSWAG  runs  performed  to  support  the  Engineer  Study 
Phase  I.  The  second  analysis  utilized  155  AMSWAG  runs,  including  120 
new  cases  from  the  Phase  II  Engineer  Study. 

The following section describes the input variables used in the
analysis.


2.2.1 Input/Output Variables of Interest. The state variables
used in the analyses of the 35 runs were defender's exposure (E), minefield
density (M), time-frame (T), preparation time (P), and the attacking
force's countermeasure (C). Each of these state variables is
described in the following paragraphs.
The exposure of a defender target (E) is represented by a
coded variable whose values range from 1 to 4. An index value of 1
means that the defender is fully exposed (FE). Index values of 2, 3, or
4 mean the defender is in 1/3, 2/3, or full hull (HD) defilade,
respectively. Exposure obviously affects the probability that a defender will
be hit by enemy fire and hence his rate of attrition.

Minefield density (M) is a coded variable which describes a
combined density of both remote emplaced (RAAMS) and ground emplaced
(GEMSS) systems. The density was measured in mines per square meter.

Time-frame (T) is also a coded variable; it takes on one of
the values 1, 2, or 3, corresponding to the current time-frame (1978),
the future time-frame (1982), or the future time-frame with the XM1, respectively.
This variable is important because much of the detailed weapon performance,
vulnerability, and other data depend on time-frame. In a sense, then, time-frame
is a surrogate for those detailed data.

Preparation time (P) reflects the amount of warning time of an
impending attack available to the defender force. During this
period the defender allocates his resources either to the preparation of
weapon positions or to the emplacement of barriers. The preparation of
weapon positions is equivalent to gaining decreased exposure. The
emplacement of barriers includes minefields and nondestructive barriers
such as an abatis or tank ditches. Preparation time takes on the
values 1, 2, 3, and 4 for warning times of 0 hours, 6 hours, 24 hours, and
greater than 24 hours, respectively.

Attacking force countermeasure (C) is a coded variable which
describes the tactics used by the attacker upon encountering a minefield.
The values of C range from 1 to 4, indicating a bull-through, line-breach,
column-breach, or delay tactic, respectively. During a bull-through, the
attacker proceeds through the minefield as if there were no minefield.
In a line-breach, individual columns follow a plow vehicle through the
minefield. When employing a column-breach, the individual attacker units
on different routes through the minefield converge into one column
behind a plow vehicle; after exiting the minefield, the units return
to their original routes. During a delay tactic, the attacker delays
his advance either before reaching or from within a minefield.

Output variables of interest included: attacker win or loss;
time until the end of the game; and final force ratios (attacker/defender)
for both vehicles and personnel.

Attacker  win  or  loss  is  a  coded  output  variable  determined  from 
the  AMSWAG  game  results.  An  attacker  win  is  attained  when  the  defending 
force  has  been  attrited  to  60  percent  of  its  original  strength.  Similarly, 
an  attacker  loss  is  defined  to  occur  when  the  attacking  force  has  been 
attrited  to  a  similar  degree.  The  variable  is  coded  1  or  0  for  an  attacker 
win  or  loss  respectively. 
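One way to hold a game's coded state vector and its coded outcome is sketched below. The level codes follow the definitions above; the container, the field names, and the helper `prep_code` (which assumes warning times fall on or between the stated levels) are illustrative choices, not part of the original study.

```python
from dataclasses import dataclass

# Level codes as defined in the text.
EXPOSURE = {"FE": 1, "1/3 defilade": 2, "2/3 defilade": 3, "HD": 4}
TIME_FRAME = {"1978": 1, "1982": 2, "1982 with XM1": 3}
COUNTERMEASURE = {"bull-through": 1, "line-breach": 2,
                  "column-breach": 3, "delay": 4}

def prep_code(warning_hours):
    """Code warning time: 0 h -> 1, 6 h -> 2, 24 h -> 3, over 24 h -> 4."""
    if warning_hours <= 0:
        return 1
    if warning_hours <= 6:
        return 2
    if warning_hours <= 24:
        return 3
    return 4

@dataclass
class GameRecord:
    exposure: int          # E, 1..4
    mine_density: float    # M, mines per square meter
    time_frame: int        # T, 1..3
    prep_time: int         # P, 1..4
    countermeasure: int    # C, 1..4
    attacker_won: bool     # True if the defender reached the loss threshold

    def state(self):
        """The initial state vector, in the order (E, M, T, P, C)."""
        return (self.exposure, self.mine_density, self.time_frame,
                self.prep_time, self.countermeasure)

    def outcome(self):
        """Coded output: 1 for attacker win, 0 for attacker loss."""
        return 1 if self.attacker_won else 0

# Hypothetical game: hull defilade, 1982, 24 h warning, delay tactic.
game = GameRecord(EXPOSURE["HD"], 0.002, TIME_FRAME["1982"], prep_code(24),
                  COUNTERMEASURE["delay"], attacker_won=False)
```

A list of such records, one per game, is exactly the "one state vector per game plus outputs" data set that the analyses below operate on.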
2.2.2 Application of Approach to 35 AMSWAG Games. The 35
simulation  runs  of  the  Engineering  Study  Phase  I  provided  the  basis  for 
the  analysis  described  here.  The  following  5  variables  were  used  to 
describe  the  state  for  each  of  these  35  games: 

(1)  minefield  density  (M), 

(2)  time-frame  (T), 

(3)  preparation  time  (P), 

(4) attacking force countermeasure (C), and

(5)  exposure  (E). 

2.2.2.1 PER Methodology (Phase 1 Study). The first output
variable analyzed was attacker win or loss, which was coded 1 or 0,
respectively. The value of each of the state variables for predicting outcome
was determined using the PER methodology. When both the dependent and
independent variables are continuous, correlation is often used to make
such assessments. However, in this case the dependent variable, outcome,
is two-valued, and hence the use of correlation is inappropriate. Instead,
other measures based upon information-theoretic concepts were used.
These measures may be symbolized by the acronym PER.

In this application P stands for the a priori probability of
attacker win, i.e., the proportion of the 35 cases which were "won" by the
attacker. Then for any state variable x we can compute quantities $E_x$
and $R_x$, which are respectively called the information gain and relative
information gain provided by x. Appendix A gives a thorough discussion
of PER. It suffices to say here that the value of $E_x$ depends upon P
(namely $0 \le E_x \le 2P(1-P)$) and that, to remove this dependence, $R_x$ is
defined as:

$$R_x = \frac{E_x}{2P(1-P)}, \quad \text{hence } 0 \le R_x \le 1.$$

The  larger  the  value  of  Rx,  the  better  the  variable  x  is  at  predicting 
outcome. 
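The exact definition of E_x is given in Appendix A, which is not reproduced here, so the sketch below rests on a stated assumption: that E_x is the reduction in the quadratic entropy 2P(1-P) obtained by conditioning on the coded value of x. Under that assumption E_x automatically satisfies 0 <= E_x <= 2P(1-P), and R_x is then formed as the text defines it. A joint assessment, such as exposure and time-frame taken together, is obtained by passing tuples of values.

```python
from collections import defaultdict

def quadratic_entropy(p):
    """2P(1-P): largest at P = 0.5, zero when the outcome is certain."""
    return 2.0 * p * (1.0 - p)

def per(values, outcomes):
    """Return (P, E_x, R_x) for one coded variable and 0/1 outcomes.

    ASSUMPTION: E_x = 2P(1-P) minus the case-weighted average of
    2P_k(1-P_k) over the groups of games sharing each value k of x.
    """
    n = len(outcomes)
    p = sum(outcomes) / n
    base = quadratic_entropy(p)
    if base == 0.0:
        raise ValueError("all games have the same outcome; R_x is undefined")
    groups = defaultdict(list)
    for value, outcome in zip(values, outcomes):
        groups[value].append(outcome)
    within = sum(len(g) / n * quadratic_entropy(sum(g) / len(g))
                 for g in groups.values())
    e_x = base - within
    return p, e_x, e_x / base

# Hypothetical four games: exposure codes, time-frame codes, outcomes.
E = [1, 1, 2, 2]
T = [1, 2, 1, 2]
won = [1, 1, 1, 0]
print(per(E, won))                 # E alone
print(per(list(zip(E, T)), won))   # E and T jointly
```

In this invented example E alone leaves some games unexplained (R_x = 1/3), while the pair (E, T) separates every outcome (R_x = 1), mirroring the rise from single-variable to joint values reported for exposure and time-frame.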


For the 35 AMSWAG cases considered:

$$P = \frac{\text{number of games won by attacker}}{\text{total number of games}} = \frac{25}{35} = .71$$

Table  1  lists  the  input  variables  in  relation  to  P,  Ex,  Rx. 
TABLE 1 RESULTS OF PER METHODOLOGY ON 35 AMSWAG CASES

VARIABLE                   P     Ex     Rx
EXPOSURE (E)              .71   .28    .68
MINEFIELD DENSITY (M)     .71   .25    .61
TIME-FRAME (T)            .71   .26    .63
PREPARATION (P)           .71   .20    .49
COUNTERMEASURE (C)        .71   .16    .34

Results indicated that for the 25 engagements out of 35 where
the defender's exposure (E) was in any state but full hull defilade and
the time-frame (T) was current, the attacker always won. The attacking
force lost only when the defender was in a state of full hull defilade
in the future time-frame.

Referring to Table 1, it should be noted that the variables
exposure (E) and time-frame (T) provide the highest values of Ex and Rx.
Listed below are the results of a PER assessment made of the variables
exposure (E) and time-frame (T) when considered jointly:

                          P     Ex     Rx
(EXPOSURE, TIME-FRAME)   .71   .39    .95

For the 35-case study it was found that the 5 initial input variables
could essentially be reduced to 2, i.e., exposure and time-frame, in
order to predict win or loss with a high degree of certainty. This is
only true for this limited data base and should not be construed as a
general result.

In the following section a more general approach to predicting
outcome is explained and applied to our 35-game data base.

2.2.2.2 Regression on Win or Loss (Phase 1 Study). The
logistic function, which is a nonlinear functional form, is often used
to obtain estimates of a probability from multiple inputs when outcomes
are two-valued, such as win or loss.

The goal of fitting the logistic function is to estimate the
coefficients of a polynomial $A(x)$, where

$$A(x) = A_0 + A_1X_1 + A_2X_2 + \cdots + A_mX_m,$$

which can be used to predict a probability, in this case the probability
of "attacker win," $P(X)$. $A_1, A_2, \ldots, A_m$ are the coefficients associated
with the independent variables $X_1, X_2, \ldots, X_m$.
In general, vectors composed of the initial state variables
and their respective game outcomes are entered into an iterative
least-squares solution. The iterations continue until stable coefficients
are obtained for the input variables, which then provide estimates of the
probability of an attacker win.

With the estimated coefficients, P(X) is determined as:

$$P(X) = \frac{1}{1 + e^{-A(X)}}.$$

See Figure 1 for a graphical representation of this function.
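The iterative least-squares fit described above can be sketched as iteratively reweighted least squares (IRLS), which is Newton's method for the logistic likelihood. The report does not say whether its program used exactly this scheme, so treat this as an illustration; the small diagonal `damping` term is an added numerical safeguard, not something taken from the report. When the outcomes are perfectly separated by the inputs, no finite maximum-likelihood coefficients exist and the iterations keep enlarging them, so very large fitted coefficients should be read with that in mind.

```python
import math

def logistic(a):
    """P = 1 / (1 + exp(-a)), written to avoid overflow for large |a|."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    z = math.exp(a)
    return z / (1.0 + z)

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_logistic(X, y, iterations=25, damping=1e-8):
    """Estimate A_0..A_m in A(x) = A_0 + A_1 x_1 + ... + A_m x_m by IRLS."""
    rows = [[1.0] + list(x) for x in X]          # leading 1 carries A_0
    k = len(rows[0])
    coef = [0.0] * k
    for _ in range(iterations):
        p = [logistic(sum(c * v for c, v in zip(coef, r))) for r in rows]
        w = [pi * (1.0 - pi) for pi in p]        # IRLS weights
        # Weighted normal equations: (X^T W X) step = X^T (y - p)
        hessian = [[sum(w[i] * rows[i][a] * rows[i][b] for i in range(len(rows)))
                    + (damping if a == b else 0.0) for b in range(k)]
                   for a in range(k)]
        gradient = [sum(rows[i][a] * (y[i] - p[i]) for i in range(len(rows)))
                    for a in range(k)]
        step = solve(hessian, gradient)
        coef = [c + s for c, s in zip(coef, step)]
    return coef

def predict(coef, x):
    """P(X) = 1 / (1 + exp(-A(X))) for one state vector x."""
    return logistic(coef[0] + sum(c * v for c, v in zip(coef[1:], x)))

# Hypothetical one-variable data whose outcomes overlap (not separable).
X = [[0], [1], [2], [3], [4], [5]]
y = [0, 0, 1, 0, 1, 1]
coef = fit_logistic(X, y)
```

At convergence the fitted probabilities satisfy the likelihood equations, so the residuals y - P sum to zero overall and against each input; that property is what the accompanying checks verify.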

In order to demonstrate the use of the logistic function for
the 35-game study, a simple linear model was chosen for A(x), in which
each of the five variables entered linearly. The following polynomial
resulted for the 35 AMSWAG cases:

$$A(x) = 711 - 94\,(\text{exposure}) - .0039\,(\text{minefield density}) - 263\,(\text{time-frame}) - 10.4\,(\text{preparation time}) + 593\,(\text{countermeasure})$$

Table 2 lists the relationships of the independent variables, A(x), and
the response P(x), i.e., the probability of an attacker win, for the 35
AMSWAG games on which the analyses were based. The coefficients listed
above represent the importance of each variable to outcome. It can be
seen that the attacker's countermeasure (C) is most important to his
winning. Exposure (E) and time-frame (T) are most beneficial to the
defender.


It may be noted that while C has the largest coefficient, it
has the smallest relative information gain (Table 1) of any variable.
This anomaly could be caused by several factors. First, the PER approach
considers each variable in isolation, while the regression considers all
variables simultaneously, adjusting coefficients so as to achieve the
best fit. Second, as noted earlier in this section, quite good prediction
could have been achieved using only the variables exposure and time-frame.
All five variables were used in the model for demonstration purposes,
which permits the coefficients on all variables to be adjusted to
achieve the best possible fit, in this case causing some unexpected results.

From Table 2 it should be clear that for the 35 AMSWAG cases,
the logistic function provided a perfect prediction of final outcome:
the predicted probability of an attacker win is effectively 1 for every
game the attacker won and effectively 0 for every game he lost. Figure 2
is a graphical representation of how the logistic function appears for
the 35-case study.

The  logistic  function  was  also  used  to  predict  the  outcome  for 
games  not  yet  played,  but  which  may  be  described  by  allowing  the  input 
variables  to  vary  within  their  respective  ranges.  The  logistic  function 
FIGURE 1 GRAPHICAL REPRESENTATION OF THE LOGISTIC FUNCTION

$$P(x) = \frac{1}{1 + e^{-A(x)}},$$

where A(x) is a polynomial.

(1) In its simplest form, with A(x) = x, we have

$$P(x) = \frac{1}{1 + e^{-x}}.$$

(2) A(x) could be a linear combination of several variables:

$$A(x) = A_0 + A_1X_1 + A_2X_2 + \cdots + A_nX_n.$$

(3) A(x) could be a polynomial of one variable:

$$A(x) = A_0 + A_1X + A_2X^2 + \cdots + A_nX^n.$$
Figure 2. A Graphical Representation of the Logistic Function for 35 Cases.


Regression coefficients for the initial state inputs with respect
to outcomes are provided in Table 4, again assuming the same simple linear
form of the variables that was used to predict attacker win or loss.
The results, presented in Table 4, indicate that exposure (E)
of the defender and time-frame (T) are most important to the responses.
These results substantiate those obtained with the PER and correlation
methodologies.


TABLE 4 REGRESSION ON 35 AMSWAG GAMES

OUTPUT                  CONSTANT      E        M        T        P        C
TEOG                      62.75     24.78    -2.19    11.38    -3.05    -0.40
RED/BLUE (VEHICLES)        3.92      0.28    -0.01    -0.44    -0.11    -0.08
RED/BLUE (PERSONNEL)       4.38      0.24    -0.01    -0.44    -0.09    -0.07


The signs of the coefficients are also informative. For example,
consider either final force ratio for vehicles or personnel (attacker/
defender). As the index of exposure increases (i.e., the defender is in
a more defilade position), it has a negative or detrimental effect on
the final force ratio. An analogous situation applies to TEOG (time to
end of game). Since both coefficients for exposure (E) and time-frame (T)
are positive, an increase in their indices would increase the length of
the game.

The ERMS (root-mean-square error) values for TEOG and the
final force ratios are listed below. It should be noted here that TEOG
varied in ten-second increments over a range of [0, 180]. These ERMS
values for the 3 outcomes represent good fits.


OUTPUT                   ERMS
TEOG                    15.56
VEHICLES (RED/BLUE)      0.12
PERSONNEL (RED/BLUE)     0.16
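The linear regressions behind Table 4 and the ERMS values can be sketched as an ordinary least-squares fit via the normal equations, followed by the root-mean-square residual. This is illustrative: the report does not say whether its ERMS divides by the number of games or by the residual degrees of freedom, and this sketch assumes the former. The example data are invented.

```python
import math

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(X, y):
    """Least-squares coefficients [b0, b1, ..., bm] for y ~ b0 + b1*x1 + ..."""
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    return solve(xtx, xty)

def erms(coef, X, y):
    """Root-mean-square error of the fitted model over the data (divides by n)."""
    residuals = [yi - (coef[0] + sum(c * v for c, v in zip(coef[1:], x)))
                 for x, yi in zip(X, y)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))

# Hypothetical data generated exactly by y = 1 + 2*x1 - 3*x2.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1 + 2 * a - 3 * b for a, b in X]
coef = fit_ols(X, y)
```

Because ERMS is in the units of the response, it is only meaningful relative to that response's range, which is why the text notes the [0, 180] range of TEOG alongside its ERMS.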


2.2.2.4 Summary (Phase 1 Study). It has been shown that the
methodologies of PER and regression can be applied to variables describing
the initial state for the 35 games studied in predicting outcomes of
interest.
PER and the logistic function were useful tools for
predicting win or loss. PER provided information gain and relative
information gain for each variable. The logistic function provided a
predictor for the probability of attacker win over the entire "state
space."


Standard  linear  regression  techniques  were  useful  (when  the 
dependent  variable  is  continuous)  in  predicting  the  outcomes  of  game 
length  and  final  force  ratios  from  the  initial  state. 

2.2.3 Application of Approach to an Expanded Data Base (155
Games). As mentioned in previous sections, 120 additional AMSWAG runs
from the Engineering Study Phase II were added to the data base. The
total number of games then under study was 155.

The same approach applied previously is used here, that is, the
methodologies of PER, logistic function, and linear regression. Since
the data base has grown substantially, cluster analysis has been added
as a tool for analysis. Its application is described subsequently.

The  variables  of  initial  engagement  range,  visibility,  and 
initial  force  ratio  were  present  in  the  120  new  cases  and  so  the  state 
vectors  for  the  original  35  games  were  expanded  to  include  them  as  well. 

Initial engagement range (I) was preset at 2.5, 2.0, 1.5, or
1.0 kilometers. Initial force ratio (IFR) was the ratio of the attacking
force to the defending force. Visibility (V) was either restricted or
unrestricted and indexed 1 or 2, respectively. The criterion for restricted
visibility was that visual acquisition was limited to a range of 1.0
kilometer.


2.2.3.1 PER and Correlation Study (Combined Study). Table 5
lists P(Attw), the independent input variables, and their associated
values of Ex (information gain of a variable x) and Rx (relative
information gain).

It  can  be  seen  that  the  variables  of  minefield  density  (M), 
initial  engagement  range  (I),  preparation  time  (P),  initial  force  ratio 
(IFR),  time-frame  (T),  and  defender's  exposure  (E)  have  the  highest 
associated  values  of  Rx.  Neither  countermeasure  (C)  nor  visibility  (V) 
seem  to  be  strongly  related  to  outcome. 

Significant differences in Rx can be observed for certain
variables between the original and expanded data bases. For example, C
had an Rx value of .34 for the 35-game study but only .05 for all 155 cases.
This drop is due to the lack of variation in C in the expanded
data base, which prevents it from being a good predictor of outcome.

In  conjunction  with  PER,  correlations  were  obtained  for  all 
input  variables.  Table  6  gives  the  correlation  matrix  of  the  eight 
TABLE 5 PER RESULTS ON COMBINED 155 GAMES

VARIABLE                      P(Attw)   Ex     Rx
EXPOSURE (E)                   0.66    0.14   0.31
MINEFIELD DENSITY (M)          0.66    0.27   0.59
TIME-FRAME (T)                 0.66    0.14   0.32
COUNTERMEASURE (C)             0.66    0.02   0.05
VISIBILITY (V)                 0.66    0.04   0.09
INITIAL FORCE RATIO (IFR)      0.66    0.16   0.35
INITIAL RANGE (I)              0.66    0.23   0.51
PREPARATION TIME (P)           0.66    0.22   0.50

TABLE 6 CORRELATION ON 8 INPUT VARIABLES, 155-CASE COMBINED STUDY

           E      M      T      C      V    IFR      I      P
E       1.00
M        .01   1.00
T        .05   -.07   1.00
C       -.05    .18   -.28   1.00
V        .21    .03    .34    .08   1.00
IFR     -.03   -.07   -.40   -.16    .04   1.00
I        .00   -.08    .43   -.17   -.09    .44   1.00
P        .04    .91   -.14    .24    .04   -.13   -.16   1.00
initial state variables used. It can be seen that a high correlation
exists between minefield density (M) and preparation time (P). It was
determined that the exclusion of one of these variables would not affect
the results. The variable minefield density was excluded, thus reducing
the initial state space to seven variables.
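The screen for redundant state variables can be sketched as a Pearson correlation over every pair of input columns, flagging pairs whose absolute correlation meets a cutoff. The 0.8 cutoff and the small example columns are hypothetical: the report does not state a numeric threshold, and deciding which member of a flagged pair to drop (minefield density, in this study) remains a judgment call.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((y - mean_b) ** 2 for y in b)
    return s_ab / math.sqrt(s_aa * s_bb)

def redundant_pairs(columns, threshold=0.8):
    """columns: {name: values}. Return (name1, name2, r) with |r| >= threshold."""
    flagged = []
    for u, v in combinations(columns, 2):
        r = pearson(columns[u], columns[v])
        if abs(r) >= threshold:
            flagged.append((u, v, r))
    return flagged

# Hypothetical columns: M and P move together, C does not.
columns = {"M": [1, 2, 3, 4], "P": [2, 4, 6, 8.5], "C": [1, -1, 1, -1]}
print(redundant_pairs(columns))
```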

2.2.3.2 Logistic Function (Combined Study). The parameters
for the logistic function in which each variable entered the polynomial
linearly were also obtained for all of the 155 engagements in the combined
study. The polynomial is given below:

$$A(x) = 5.9405 - .6022(E) - .9577(T) + .1741(C) - .9774(V) + 1.3981(\mathrm{IFR}) - .1419(I) - .9718(P).$$

This function, A(x), was used to calculate P (the probability of an
attacker win) for the 155 cases. There were 27 misclassifications
observed, where a misclassification is defined as any game where the
absolute value of the predicted probability of win minus the actual
outcome, a 1 or 0, is greater than 0.5. More simply stated, a game is
classified as an attacker win if the logistic function prediction of
the probability of attacker win exceeds 0.5; otherwise, it is classified
as a defender win.
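The classification rule can be applied directly to the combined-study polynomial. The sketch below transcribes the printed coefficients of A(x); the example state vector is hypothetical (exposure, time-frame, countermeasure, visibility, and preparation time as codes, IFR as a ratio, initial range in kilometers) and is not one of the 155 games.

```python
import math

# Coefficients of A(x) for the 155-game combined study, as printed in the text.
COEF = {"const": 5.9405, "E": -0.6022, "T": -0.9577, "C": 0.1741,
        "V": -0.9774, "IFR": 1.3981, "I": -0.1419, "P": -0.9718}

def p_attacker_win(state):
    """P = 1 / (1 + exp(-A(x))) for a dict of state-variable values."""
    a = COEF["const"] + sum(COEF[name] * value for name, value in state.items())
    return 1.0 / (1.0 + math.exp(-a))

def misclassified(state, attacker_won):
    """True if |predicted P - actual outcome (1 or 0)| > 0.5."""
    return abs(p_attacker_win(state) - (1 if attacker_won else 0)) > 0.5

# Hypothetical state: full hull defilade, 1982 with XM1, bull-through,
# unrestricted visibility, IFR 3, initial range 2.5 km, over 24 h warning.
example = {"E": 4, "T": 3, "C": 1, "V": 2, "IFR": 3.0, "I": 2.5, "P": 4}
print(round(p_attacker_win(example), 3))
```

For this invented state A(x) is about -1.17, so P is roughly 0.24: the game would be classified as a defender win, and it would count as a misclassification only if the attacker had actually won.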

It may be noted that there is a 17.4% misclassification rate (27 of
155 cases) associated with this "overall" logistic function. This result
may reflect the inability of a single model, particularly
a non-phenomenological one, to make accurate predictions of outcome over
large regions of a high-dimensional space. In later sections of
this report, cluster analysis and further modeling within clusters are
shown to reduce the 27 misclassifications to three.

2.2.3.3 Cluster Analysis (Combined Study). Since the data base
has increased from 35 AMSWAG cases to 155, and the dimensionality of the
state space has increased from five to seven, the ability of a single
model to make accurate predictions over the entire state space becomes
limited. It would be useful to derive more locally restricted, yet more
accurate, predictive models. Cluster analysis is a set of mathematical
methods for separating large numbers of vectors into smaller, compact
subsets called clusters. In the particular clustering method used in
this study [13, 19], input or "state" vectors that are "close" to one
another in a Euclidean distance sense are grouped into clusters.
Parameters that control the lumping and splitting of clusters on
successive iterations help to attain the desired degree of cluster
separation and cluster compactness.
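The lumping-and-splitting procedure cited above is not spelled out in this section. The following is a simplified, hypothetical sketch in the spirit of ISODATA-type algorithms, not the method of references 13 and 19: state vectors are assigned to the nearest center by Euclidean distance, a cluster whose spread along some axis exceeds `max_spread` is split in two, and centers closer together than `min_sep` are lumped. The parameter names and the exact split and lump rules are assumptions.

```python
import math
import random


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def assign(vectors, centers):
    """Group each state vector with its nearest center (Euclidean distance)."""
    groups = [[] for _ in centers]
    for v in vectors:
        nearest = min(range(len(centers)), key=lambda i: math.dist(v, centers[i]))
        groups[nearest].append(v)
    return [g for g in groups if g]  # discard empty clusters


def split(groups, max_spread):
    """Split any cluster whose largest per-axis standard deviation exceeds max_spread."""
    centers = []
    for g in groups:
        c = centroid(g)
        sds = [math.sqrt(sum((v[d] - c[d]) ** 2 for v in g) / len(g)) for d in range(len(c))]
        d = max(range(len(c)), key=lambda k: sds[k])
        if sds[d] > max_spread:
            lo, hi = list(c), list(c)
            lo[d] -= sds[d] / 2  # place two new centers either side of the old one
            hi[d] += sds[d] / 2
            centers += [tuple(lo), tuple(hi)]
        else:
            centers.append(c)
    return centers


def lump(centers, min_sep):
    """Merge (average) any two centers closer together than min_sep."""
    centers = list(centers)
    i = 0
    while i < len(centers):
        j = i + 1
        while j < len(centers):
            if math.dist(centers[i], centers[j]) < min_sep:
                centers[i] = centroid([centers[i], centers.pop(j)])
                j = i + 1  # re-check remaining centers against the merged one
            else:
                j += 1
        i += 1
    return centers


def isodata_like(vectors, k0, min_sep, max_spread, iterations=10, seed=0):
    """Iterate assign -> split -> lump from k0 random starting centers."""
    centers = random.Random(seed).sample(vectors, k0)
    for _ in range(iterations):
        centers = lump(split(assign(vectors, centers), max_spread), min_sep)
    return assign(vectors, centers)
```

Choosing `min_sep` no larger than `max_spread` keeps a freshly split pair of centers from being lumped back together on the same iteration.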

Using this technique, clusters were obtained for the 155-case
study. Within each cluster, an a priori probability P of an attacker win
is estimated by the relative percentage of attacker wins in that cluster.
Table 7 lists the clusters with their numbers of attacker losses and
wins and gives P (probability of attacker win) within each cluster.
TABLE 7  CLUSTER MEMBERSHIP TABLE

            (ATTACKER)
CLUSTER #   LOSS   WIN   P (PROB. ATTACKER WIN)
1           12     15    0.56
2            0      4    1.0
3            0      4    1.0
4            0      4    1.0
5            1     16    0.94
6            4      0    0.0
7            4      0    0.0
8            0      5    1.0
9           11     12    0.52
10           5     23    0.82
11          16     11    0.41
12           0      8    1.0

It can be seen from Table 7 that clusters 1, 9, 10, and 11 are
candidates for further modeling, since all other clusters have little
variability of outcome within them and hence require no formal modeling
in order to make future predictions of outcome for games whose input
states reside in those clusters.
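The per-cluster estimates in Table 7 are simple relative frequencies, and the choice of clusters to model can be expressed as a rule on the minority outcome. The sketch below transcribes the Table 7 counts; the 10% minority threshold in `needs_modeling` is a hypothetical rule chosen for illustration, not a criterion stated in the report.

```python
# (attacker losses, attacker wins) per cluster, transcribed from Table 7.
TABLE_7 = {
    1: (12, 15), 2: (0, 4), 3: (0, 4), 4: (0, 4), 5: (1, 16), 6: (4, 0),
    7: (4, 0), 8: (0, 5), 9: (11, 12), 10: (5, 23), 11: (16, 11), 12: (0, 8),
}


def p_attacker_win(losses, wins):
    """A priori P of an attacker win: relative frequency of wins in the cluster."""
    return wins / (losses + wins)


def needs_modeling(losses, wins, min_minority=0.10):
    """HYPOTHETICAL rule: model a cluster only if its minority outcome exceeds min_minority."""
    p = p_attacker_win(losses, wins)
    return min(p, 1.0 - p) > min_minority


for cluster, (losses, wins) in TABLE_7.items():
    print(cluster, round(p_attacker_win(losses, wins), 2))

candidates = [c for c, (losses, wins) in TABLE_7.items() if needs_modeling(losses, wins)]
print(candidates)  # [1, 9, 10, 11]
```

With this assumed threshold the rule selects clusters 1, 9, 10, and 11, the same clusters chosen above; cluster 5 (one loss in 17 games) falls below it.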

The methodologies of PER and linear correlation were applied
to the games within each cluster. These methodologies assisted in choosing
the best initial state variables (within clusters) to be included in the
appropriate regression model. That is, in different regions of the state
space (i.e., in different clusters), different variables are useful in
predicting outcome and should be used in the more localized models.

Using the above methodologies, the variables exposure (E),
time-frame (T), initial engagement range (I), and preparation time (P)
were chosen for the linear regression on cluster 1. Time-frame (T),
countermeasure (C), initial force ratio (IFR), and initial engagement
range (I) were used to model clusters 9, 10, and 11. Table 8 lists, for
the overall logistic function and for each cluster model, the linear
coefficients and ERMS (root-mean-squared error).
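A within-cluster model of the kind summarized in Table 8 can be sketched as an ordinary least-squares fit of the 0/1 outcome on the selected state variables, with ERMS taken as the root-mean-squared residual. This is an assumption about how the fits and ERMS were computed, since the fitting procedure is not given here; the sketch uses NumPy's `np.linalg.lstsq`, and the function name is hypothetical.

```python
import numpy as np


def fit_cluster_regression(X, y):
    """Least-squares fit of y ~ b0 + b1*x1 + ... + bm*xm for one cluster.

    X: (n_games, m) array of the state variables chosen for the cluster.
    y: length-n array of outcomes (1 = attacker win, 0 = attacker loss).
    Returns (coefficients with the intercept first, ERMS).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # intercept column first
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    e_rms = float(np.sqrt(np.mean(residuals ** 2)))  # root-mean-squared error
    return coef, e_rms
```

Because the response is 0/1, such a linear model can return values outside [0, 1]; its output is compared with 0.5 in the same way as the logistic prediction.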
TABLE 8  REGRESSION COEFFICIENTS FOR "OVERALL" LOGISTIC FUNCTION
         AND INDIVIDUAL CLUSTER REGRESSION MODELS

                                          VARIABLE COEFFICIENTS
LOGISTIC/CLUSTER MODEL      CONSTANT   E      T      C      IFR    I      P      V      ERMS
Overall Logistic Function    5.94     -0.60  -0.96   0.17   1.40  -0.14  -0.97  -0.98   *
Cluster #1**                 2.34     -0.20  -0.49   N/A    N/A   -0.07  -0.05   N/A    0.25
Cluster #9                  -2.60      N/A    0.69   0.84   0.28  -0.00  -0.00   N/A    0.22
Cluster #10                  1.59      N/A    0.06  -0.88   0.03  -0.00  -0.00   N/A    0.20
Cluster #11                 -1.08      N/A    0.04   0.14   0.30  -0.00  -0.00   N/A    0.41

NOTE: N/A: Not applicable
      *:   Not obtained


For example, the model for Cluster #1 is:

$$P(\mathrm{ATTW}) = 2.34 - 0.20E - 0.49T - 0.07I - 0.05P$$


Referring to Table 8, we see that time-frame (T) and exposure (E)
contribute most to the model for cluster #1. The model for cluster #9
uses time-frame (T), countermeasure (C), and initial force ratio (IFR).
The model for cluster #10 depends heavily on countermeasure (C). It is
interesting to note that the coefficients of countermeasure (C) for
cluster #9 and cluster #10 have opposite signs; hence any change in
countermeasure (C) affects outcomes in these clusters in opposite ways.

The model for cluster #11 is the most distinctive. It bases its
predictions mainly on initial force ratio (IFR) and countermeasure (C).

2.2.3.3.1 Results of Cluster Analysis. It has been shown in
Table 7 that cluster membership can be used to provide an estimate of the
probability of an attacker win. This is beneficial when a known initial
state vector falls within a cluster containing either all wins or all
losses, since no modeling within that cluster is then necessary.

Earlier it was observed that when a single response surface
(logistic function) was derived to represent the outcomes for the state
vectors of all 155 games in the Engineering Study data base, that model
produced 27 misclassifications. By using models developed for specific
clusters, the misclassifications have been reduced to three, i.e., a
misclassification rate of less than 2% (3 of 155). These were produced
by the linear regression predictions for single games in clusters 10 and
11 and by the single loss vector observed in cluster #5.

Listed  in  Table  9  are  the  state  variables  for  the  three  cases  that 
were  misclassified  after  modeling  within  individual  clusters  was  performed. 


TABLE 9  CASES CONSIDERED ANOMALIES AFTER CLUSTER ANALYSIS

CASE#  CLUSTER#  E    T    C    VIS  IFR  I    P    %LOSS BLUE  %LOSS RED
02B    5         1.0  2.0  2.0  2.0  2.6  2.0  1.0  59          60
06A    10        4.0  2.0  1.0  2.0  3.7  2.0  1.0  59          60
34P    11        4.0  2.0  1.0  2.0  6.2  2.0  4.0  51          60

It should be noted that all three of these vectors exhibit two
similar characteristics which may provide a possible explanation for
their misclassification. The first is that all three games should be
considered close with respect to percent losses. That is, while the
arbitrary "loss" criterion in AMSWAG defined a definite winner and loser,
the outcome was, in actuality, very much in doubt. The second appears to
be related to an initial engagement range (I) of 2.0 kilometers. It
should also be noted that the three anomalies are from the Engineering
Phase II study. Cases 02B, 06A, and 34P are similar in that the recorded
outcome is an attacker loss but the associated within-cluster model
predicts an attacker win. Each misclassified case is listed in Table 10
along with the others in its structured group of cases as evaluated in
the Engineer Study.
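The statement that the within-cluster predictions favor an attacker win for all three anomalies can be checked directly. This hypothetical sketch evaluates the rounded Table 8 coefficients on the Table 9 state vectors, treats each linear model's output as the predicted probability of an attacker win, and, for cluster 5 (which has no regression model), falls back on its Table 7 membership estimate of 16/17. The names and dictionary layout are illustrative assumptions.

```python
# Table 8 coefficients for the within-cluster linear models (N/A entries omitted).
CLUSTER_MODELS = {
    1: {"const": 2.34, "E": -0.20, "T": -0.49, "I": -0.07, "P": -0.05},
    9: {"const": -2.60, "T": 0.69, "C": 0.84, "IFR": 0.28, "I": -0.00, "P": -0.00},
    10: {"const": 1.59, "T": 0.06, "C": -0.88, "IFR": 0.03, "I": -0.00, "P": -0.00},
    11: {"const": -1.08, "T": 0.04, "C": 0.14, "IFR": 0.30, "I": -0.00, "P": -0.00},
}
# Clusters without a regression model use their Table 7 membership estimate.
MEMBERSHIP_P = {5: 16 / 17}

# Table 9 state vectors of the three anomalous cases: (cluster, state).
ANOMALIES = {
    "02B": (5, {"E": 1.0, "T": 2.0, "C": 2.0, "VIS": 2.0, "IFR": 2.6, "I": 2.0, "P": 1.0}),
    "06A": (10, {"E": 4.0, "T": 2.0, "C": 1.0, "VIS": 2.0, "IFR": 3.7, "I": 2.0, "P": 1.0}),
    "34P": (11, {"E": 4.0, "T": 2.0, "C": 1.0, "VIS": 2.0, "IFR": 6.2, "I": 2.0, "P": 4.0}),
}


def predict_p_win(cluster, state):
    """Within-cluster estimate of P(attacker win) for one initial state vector."""
    model = CLUSTER_MODELS.get(cluster)
    if model is None:
        return MEMBERSHIP_P[cluster]
    return model["const"] + sum(
        coef * state[name] for name, coef in model.items() if name != "const"
    )


for case, (cluster, state) in ANOMALIES.items():
    p = predict_p_win(cluster, state)
    print(case, cluster, round(p, 2), "attacker win" if p > 0.5 else "attacker loss")
```

With these rounded coefficients the predictions are about 0.94 (02B), 0.94 (06A), and 1.0 (34P), all above the 0.5 threshold.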

TABLE 10  CASES CONSIDERED ANOMALIES WITHIN STRUCTURED DATA

CASE#  CLUSTER#  E    T    C    VIS  IFR  I    P    %LOSS BLUE  %LOSS RED
01B    5         1.0  2.0  2.0  2.0  2.6  2.5  1.0  60          44
02B*   5         1.0  2.0  2.0  2.0  2.6  2.0  1.0  59          60
03B    5         1.0  2.0  2.0  2.0  2.6  1.5  1.0  60          33
04B    5         1.0  2.0  2.0  2.0  2.6  1.0  1.0  60          30
05A    10        4.0  2.0  1.0  2.0  3.7  2.5  1.0  60          39
06A*   10        4.0  2.0  1.0  2.0  3.7  2.0  1.0  59          60
07A    10        4.0  2.0  1.0  2.0  3.7  1.5  1.0  60          33
08A    10        4.0  2.0  1.0  2.0  3.7  1.0  1.0  60          28
33P    11        4.0  2.0  1.0  2.0  6.2  2.5  1.0  60          54
34P*   11        4.0  2.0  1.0  2.0  6.2  2.0  1.0  51          60
35P    11        4.0  2.0  1.0  2.0  6.2  1.5  1.0  60          53
36P    11        4.0  2.0  1.0  2.0  6.2  1.0  1.0  60          58

NOTE: Cases designated with * are considered anomalies.


Within the structured data groups, it can be seen that the three
misclassifications occur at an initial engagement range of 2.0
kilometers. These anomalies are similar to those found when initial
engagement range is compared to effectiveness measures within the
Engineering Phase II study [1]. It was found in that study that, in
general, as initial engagement range decreases, so does the Blue forces'
effectiveness. However, it was also stated that all Blue effectiveness
measures "sawtoothed" at 2.0 kilometers, yielding the results which we
see in our cluster modeling.

3. SUMMARY/CONCLUSIONS/RECOMMENDATIONS

In this report, it has been shown that the methodologies of PER,
linear and non-linear regression, and, later with an expanded data base,
cluster analysis could be used to predict probabilities of win and other
outcomes of interest. It is important to remember that this was
accomplished based only upon the initial state of a wargame and, of
course, knowledge concerning simulation outcomes.

PER and linear correlation were instrumental in measuring the
information gain that the input variables provide about the outcomes of
interest.

The logistic function was shown to be powerful in predicting
probability of win, especially with the initial 35-case study, where its
predictions were perfect. Later, when the data base was expanded to 155
cases, cluster analysis enhanced the accuracy available from the regular
regression approach. Modeling of individual clusters by linear regression
techniques provided better models of the more evenly split clusters
(clusters which included both wins and losses).

Design and modeling of this type provide a basis for quantitatively
determining the significance of individual factors and interactions for
outcomes of interest in a wargame, for critically comparing the results
of studies, and for influencing developmental efforts.


The approach provides a way to use the results of a small number of
runs more efficiently and a consistent method for analyzing a large
number of runs.
3.1 Conclusions and Thoughts on Future Work.

This  report  briefly  reviewed  how  several  techniques  and  methods 
had  been  used  to  "model"  complex  systems  and/or  simulations.  The  specific 
objective  of  this  report  was  to  apply  these  techniques  to  the  representation 
and  analysis  of  results  obtained  with  AMSWAG,  a  combat  simulation  frequently 
used  at  AMSAA.  The  benefits  and  uses  of  the  approach  were  outlined  and 
specific  results  were  obtained  for  35  and  155  AMSWAG  cases. 

These runs had been performed to support two Engineer Studies.
Our analyses were meant to demonstrate the methodology with the model
runs that were then available. No attempt was made to structure and
perform additional runs, due to time and cost constraints.

When reviewing the available runs, it became apparent that they
had been structured based upon the "one at a time" approach. With this
method, all input variables (or factors) except one are held constant
while the remaining factor is cycled through its possible values. The
process is then repeated for each variable. This is done in order to
gain some insight into the effect of changing each variable individually
on the responses of interest.

There are several limitations to this approach. First, each
group of runs provides information only concerning the effect of the
one variable which is being altered. If, instead, all the components of
the vector describing the input to the model had been varied
simultaneously, information concerning the effects of other factors
could have been extracted from the runs. Further, the one at a time
approach can assess the effect of each variable only under the
assumption that no interactions exist, and it cannot estimate the
interactions themselves. (An interaction is defined to exist between two
factors when the change in response due to a change in the level of one
factor depends upon the level of the other factor.) For example, the
diagram below indicates the presence of an interaction between factors
A and B.


In the above example, the main effect of A is of little use in
prediction, and we may wish only to estimate the interaction AB. If we
are unable to assess interactions, as when the "one at a time" approach
is used, the study could reach erroneous conclusions.
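The point about interactions can be made concrete with a 2 x 2 example. The sketch below uses hypothetical responses with a crossing pattern (not AMSWAG data) and the usual coded-level (-1, +1) estimates of main effects and interaction; it also shows what a "one at a time" run would conclude about A when B is held at its low level.

```python
def effects_2x2(y):
    """Main effects of A and B and the AB interaction for a 2 x 2 factorial.

    y maps coded levels (a, b), each -1 (low) or +1 (high), to a response.
    Each effect is (sum of responses at +1) minus (sum at -1), divided by 2.
    """
    a = (y[(1, -1)] + y[(1, 1)] - y[(-1, -1)] - y[(-1, 1)]) / 2
    b = (y[(-1, 1)] + y[(1, 1)] - y[(-1, -1)] - y[(1, -1)]) / 2
    ab = (y[(1, 1)] + y[(-1, -1)] - y[(1, -1)] - y[(-1, 1)]) / 2
    return a, b, ab


# HYPOTHETICAL responses with a crossing (pure interaction) pattern.
y = {(-1, -1): 0.2, (1, -1): 0.8, (-1, 1): 0.8, (1, 1): 0.2}
a, b, ab = effects_2x2(y)

# A "one at a time" study varies A only at the baseline level B = -1.
oat_effect_of_a = y[(1, -1)] - y[(-1, -1)]
print(a, b, ab, oat_effect_of_a)
```

Here the main effects of A and B are zero (up to floating-point rounding) and the interaction is -0.6, while the one-at-a-time run reports an effect of +0.6 for A, which reverses sign when B is at its high level.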
A factorial experimental design approach to structuring the runs
for such an analysis overcomes the limitations of the "one at a time"
method. With this approach, information concerning the contributions of
interactions and main effects of factors can be obtained and tested
(with equal precision) for significance, a procedure which has in the
past been assessed only subjectively. Thus, the factorial procedure
would add objectivity to the portrayal of study results and to the
comparison of results from study to study.
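The difference between the two ways of structuring runs can be sketched by generating both designs. The factor names below follow the report's state variables, but the levels and baseline are hypothetical choices for illustration, and a full factorial is only the simplest factorial design; the report does not say which factorial design would be used.

```python
from itertools import product


def one_at_a_time(baseline, levels):
    """Baseline run plus runs that change exactly one factor from the baseline."""
    runs = [dict(baseline)]
    for factor, values in levels.items():
        for value in values:
            if value != baseline[factor]:
                run = dict(baseline)
                run[factor] = value
                runs.append(run)
    return runs


def full_factorial(levels):
    """Every combination of factor levels (a full factorial design)."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*(levels[n] for n in names))]


# HYPOTHETICAL factor levels and baseline, for illustration only.
levels = {"E": [1.0, 4.0], "T": [1.0, 2.0], "I": [1.0, 1.5, 2.0, 2.5]}
baseline = {"E": 1.0, "T": 1.0, "I": 1.0}
oat = one_at_a_time(baseline, levels)
factorial = full_factorial(levels)
print(len(oat), len(factorial))  # 6 16
```

The one-at-a-time design never contains a run in which two factors differ from the baseline together, so it carries no information about interactions; the full factorial contains every such combination, at a cost that grows multiplicatively with the number of levels (fractional factorial designs reduce that cost).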

Future endeavors in this area will concentrate on:

(1) continuous updating of the data base and revision of predictive
models;

(2) participation in the design of sets of runs to be used in future
AMSWAG efforts; and

(3) the development of models which predict outcome based upon the
"state" of the game at times other than t0, i.e., a dynamic state.

There  are  ways  in  which  the  applicability  and  efficiency  of  the 
data  base  might  be  improved.  We  will  now  discuss  several  possibilities 
related  to  (1)  and  (2)  above. 

An  existing  "state  space"  model  can  be  improved  and  broadened 
by  the  inclusion  of  input/output  from  additional  model  runs.  When  the 
results  of  a  new  study  are  added,  predictive  models  for  the  entire  data 
base  can  be  regenerated  rapidly.  Reclustering  and,  if  needed,  new  models 
for  the  clusters  affected  can  also  be  regenerated  efficiently. 

A  limitation  which  arises  in  the  use  of  this  type  of  surrogate 
modeling  occurs  when  the  structure  of  the  simulation  being  modeled 
undergoes  change.  That  is,  as  changes  are  made  to  AMSWAG,  the  surrogate 
models  must  be  updated  and  revalidated.  Revalidation  must  be  undertaken 
when  cluster  membership  is  significantly  affected  and/or  new  clusters  are 
developed.  The  revalidation  methods  would  be  similar  to  those  used  in 
the  text  of  this  report,  that  is,  a  comparison  is  made  of  the  surrogate's 
estimates  with  the  actual  model  results. 

Examples  of  this  situation  could  arise  in  cases  where  force 
constitution  has  changed  or  the  "initial  state"  has  been  altered  to  the 
point  where  no  part  of  the  existing  data  base  is  representative. 

Concerning item (3), when considering the state as a function of
time, we wish to construct a response surface which gives P(W/S(t)), the
probability of a win given that S(t) is the state of the game at time t.
With such a probability distribution, an objective would be to establish
regions or partitions in the S(t) space for which P(W/S(t)) = k, for
k = 0, 0.1, 0.2, 0.3, ..., 1.0. The costs associated with the new
developments or changes in tactics which are required to go from one
region to another with a higher probability of win could then be
assessed and give direction to future efforts.

Significant  work  has  already  been  undertaken  in  this  area. 

The  results  of  these  efforts  will  soon  be  reported. 
APPENDIX


PER  METHODOLOGY 

The PER methodology is a way to assess the ability of an
independent variable to predict a binary (two-valued) response. It is
analogous to the role played by correlation when both independent and
dependent variables are continuous. For example, one response of concern
in the war game analyses is the outcome of the game. This outcome may
take on a value of 1 or 0 depending on whether the attacking force wins
or loses.

PER is based on certain information-theoretic concepts and is
in fact an acronym. P stands for the a priori probability of outcome 1
(probability of the attacker winning):

$$P = \frac{\text{number of outcomes with response 1}}{\text{total number of outcomes}}$$

that is, the relative frequency with which the attacker wins. Ex is the
average information gain of a variable x: the average amount by which one
would change one's estimate of game outcome, given a value of x. The
information gain of a perfect predictor x is given by:

$$E_x = P(1-P) + (1-P)P = 2P(1-P)$$

This may be seen in the following way. Given no value of x, the
best estimate of the probability of a "1" outcome is P, the a priori
probability of 1. Given the value of a perfect predictor x, one would move
the prediction from P to 1 (a distance of 1-P) for all 1's, which occur a
fraction P of the time. Similarly, the prediction would move from P to 0
(a distance of P) for all zero outcomes, which occur a fraction 1-P of
the time. The average distance the prediction is moved, given a value of
x, is then seen to be P(1-P) + (1-P)P = 2P(1-P).

When computing Ex, a partitioning must be established on the
range of values taken on by x. If x is a continuous variable, such a
partitioning can be obtained, for example, by dividing the range into a
number (usually 5-10) of equal-width intervals. If x is discrete and
takes on only a small number of values, bins may be defined on the range
of x. In either case the bins can be indexed from 1 to k. Within a bin we
can count the number of observations of x which were associated with
outcomes of "1". The probability of a "1" outcome occurring in the jth
bin, j = 1, 2, 3, ..., k, is:


$$P(\text{"1"}/j) = \frac{\text{number of "1" outcomes associated with } x \text{ values in bin } j}{\text{number of outcomes in bin } j}$$

The frequency with which observations fall into the jth bin is:

$$f_j = \frac{\text{number of observations of } x \text{ in the } j\text{th bin}}{\text{total number of observations of } x}$$

hence,

$$E_x = \sum_{j=1}^{k} f_j \, \lvert P - P(\text{"1"}/j) \rvert$$

It should be noted that the information gain, Ex, depends upon P,
the probability of the attacker winning. Previously, it was shown that
Ex for a perfect predictor x is 2P(1-P). Thus, for any variable x, Ex is
limited to the range

$$0 \le E_x \le 2P(1-P).$$

The relative information gain, R in the PER acronym, is a
normalized version of Ex which removes the dependence on P.

For a variable x, the relative information gain Rx is defined as:

$$R_x = \frac{E_x}{2P(1-P)}$$

Rx represents the predictive capability of the variable x relative to
that of a perfect predictor. It should be noted that $0 \le R_x \le 1$;
the greater the value of Rx, the more useful x alone is in making
predictions concerning outcome.
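The definitions of P, Ex, and Rx above translate directly into code. The sketch below is a minimal illustration with hypothetical function names; `equal_width_bins` implements the equal-width partitioning suggested for continuous x, and discrete x values can be passed to `per` directly as bin labels.

```python
from collections import defaultdict


def equal_width_bins(x, k=5):
    """Bin index 0..k-1 for each value, using k equal-width intervals over the range of x."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / k or 1.0  # guard against a constant x
    return [min(int((v - lo) / width), k - 1) for v in x]


def per(bins, outcomes):
    """Return (P, Ex, Rx) for bin labels of x and 0/1 outcomes."""
    n = len(outcomes)
    p = sum(outcomes) / n  # a priori probability of a "1"
    if not 0.0 < p < 1.0:
        raise ValueError("Rx is undefined when all outcomes are identical")
    tally = defaultdict(lambda: [0, 0])  # bin -> [observations, "1" outcomes]
    for b, y in zip(bins, outcomes):
        tally[b][0] += 1
        tally[b][1] += y
    # Ex = sum over bins of f_j * |P - P("1"/j)|, with f_j = m / n
    e_x = sum((m / n) * abs(p - ones / m) for m, ones in tally.values())
    r_x = e_x / (2.0 * p * (1.0 - p))
    return p, e_x, r_x
```

For a perfect predictor every bin is pure, so Ex reaches its upper bound 2P(1-P) and Rx = 1; a variable that puts every observation in one bin gives Ex = 0 and Rx = 0.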


A few other comments should be made concerning PER. First, in
those cases when the outcome is binary, the use of a correlation
coefficient is not appropriate in determining the degree of relationship
which exists between any two variables. PER, therefore, provides an
attractive alternative. Second, it is well known that correlation
measures the degree of linear relationship that exists between two
variables. Two variables can exhibit a low or zero correlation and yet
be perfectly, but non-linearly, related. Therefore, if a low correlation
with outcome is observed for an independent variable, that variable
could wrongly be excluded from further analyses. PER has no such
limitation; it identifies such a variable as a useful predictor of
outcome.
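The contrast between correlation and PER can be illustrated with a small hypothetical data set in which the outcome is 1 only for the middle values of x. Each distinct value of x is used as its own bin, as described above for discrete variables; the function names are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def relative_info_gain(x, y):
    """Rx for a discrete x, using each distinct value of x as its own bin."""
    n = len(y)
    p = sum(y) / n
    bins = {}
    for xi, yi in zip(x, y):
        bins.setdefault(xi, []).append(yi)
    e_x = sum(len(v) / n * abs(p - sum(v) / len(v)) for v in bins.values())
    return e_x / (2 * p * (1 - p))


# HYPOTHETICAL data: the outcome is 1 only for middle values of x.
x = [-2, -1, 0, 1, 2]
y = [0, 1, 1, 1, 0]
print(pearson(x, y), relative_info_gain(x, y))
```

The correlation is zero (up to floating-point rounding), so a correlation screen would discard x, while Rx equals 1 because x determines the outcome exactly.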

